Analogies and Differences in the Photoactivation Mechanism of Bathy and Canonical Bacteriophytochromes Revealed by Multiscale Modeling

Bacteriophytochromes are light-sensing biological machines that switch between two photoreversible states, Pr and Pfr. Their relative stability is opposite in canonical and bathy bacteriophytochromes, but in both cases the switch between them is triggered by the photoisomerization of an embedded bilin chromophore. We applied an integrated multiscale strategy of excited-state QM/MM nonadiabatic dynamics and (QM/)MM molecular dynamics simulations with enhanced sampling techniques to the Agrobacterium fabrum bathy phytochrome and compared the results with those obtained for the canonical phytochrome Deinococcus radiodurans. Contrary to what recently suggested, we found that photoactivation in both phytochromes is triggered by the same hula-twist motion of the bilin chromophore. However, only in the bathy phytochrome, the bilin reaches the final rotated structure already in the first intermediate. This allows a reorientation of the binding pocket in a microsecond time scale, which can propagate through the entire protein causing the spine to tilt.


S1 Methods
A multiscale strategy has been applied for modeling the photoactivation mechanism of Agp2 (Fig. S1).First, we sampled the configurational space of the solvated dimeric system in the Pfr state through four µs-long MM MDs.Subsequently, after a refinement of the PES of the QM subsystem achieved with ps-long QM/MM MDs, we investigated the photochemical process with the surface hopping method. 1,2Then, to investigate the evolution of the photoproduct and to characterize the first (Lumi-F) intermediate, we used ground-state (QM/)MM MDs.Finally, we tried to reach the second (Meta-F) intermediate by using the Gaussian Accelerated Molecular Dynamics (GaMD) method.
To investigate the possibility of an ultrafast proton transfer, we tested both static and dynamic models.In the former, we performed both ground-and excited-state QM/MM optimizations, while in the latter we combined enhanced sampling techniques (OPES EXPLORE) with excited-state polarizable QM/MM MDs.

Sampling the Pfr state
The initial structural model of the Agp2 phytochrome in the resting (Pfr) state was generated starting from the 6G1Y 3 entry of the Protein Data Bank.MODELLER 4 was used to add the missing residues in the crystal structure, namely Met1, Ala2, Ser3, Thr4, Asp5, Thr81, Gly82, Arg83, and Thr84 for both chains A and B, Ser121 and Asp122 for chain A, and Thr85 for chain B. The structure was subjected to further loop refinement to improve global structure quality using From the photoproduct to the Lumi-F intermediate   the ModRefiner web server. 5,6][10][11] Partial charges of the BV were computed using the RESP protocol 12 on top of a B3LYP/6-311G(d,p) electrostatic potential calculation.
The chromophore and its bonding interactions with the covalently bonded cysteine residue (Cys13), were described with the GAFF force-field 13 .Our GAFF parametrization of BV was based on a careful choice of atom types to conform to the conjugation pattern of the chromophore.This force field for BV was already used in previous work on both the (resting) Pr and (active) Pfr states of the DrBph phytochrome; 14,15 the resulting ensembles successfully reproduced different spectroscopic properties of the system, indicating that GAFF (with our assignment of parameters) can indeed properly describe BV.The protein was described with the AMBER ff14SB forcefield, 16 while water molecules with the TIP3P model. 17The refined dimeric system (protein and crystallographic waters) was first subjected to an in vacuo minimization with a 10 kcal mol −1 Å −1 harmonic restraint on backbone atoms; then, the entire dimer was soaked in a truncated octahedron water box and electrically neutralized with 0.15M NaCl. 18The solvated system was subjected to two minimization cycles of 5000 steps each: 2500 steps using steepest descent and other 2500 steps using conjugate gradient.In the first cycle, we restrained backbone atoms with a 10 kcal mol −1 Å −1 harmonic potential.Then, the entire system was minimized without constraints.A 1 ns-long heating to 300 K in an NVT ensemble followed, restraining the movement of the chromophores, the protein backbone, and the crystal water oxygens with a 10 kcal mol −1 Å −2 harmonic potential.To assure a slow release of the restraints, we performed three different 2 ns-long NPT equilibration runs by gradually decreasing the restraints to 1 kcal mol −1 Å −2 .
The production run was carried out without any restraint for 2 µs in the NPT ensemble, of which we discarded the first 500 ns.Average-quantity analyses were made on the remaining 1.5 µs (15000 frames per trajectory).
All simulations were performed applying the particle mesh Ewald (PME) truncation method (with a short-range cut-off of 10 Å), an integration step of 2 fs, the SHAKE algorithm, a Langevin thermostat with a collision frequency of 1 ps −1 , and the Monte Carlo barostat for NPT simulations.[21][22][23] We performed a clustering analysis on 12000 structures extracted from the four MM MD replicas based on hydrogen bond donor-acceptor distances involving the D ring and the nearby residues (Fig. S2).The final dataset comprises 12000 structures per 8 distances.
To reduce the dimensionality of the problem, we have used a principal component analysis (PCA) based on intermolecular distances involving key residues in the chromophore-binding pocket.On top of that, we have applied a hierarchical clustering algorithm finally obtaining three clusters.In cluster 1, the D-ring carbonyl does not interact with a water molecule, but instead with Gln190.
In both clusters 0 and 2, the D-ring carbonyl interacts with (at least) one water molecule, but in cluster 0 there is no hydrogen bond between the Cprop group and water molecules (Fig. S2).

Nonadiabatic dynamics
We extracted ten representative configurations from the cluster analysis performed on the Pfr state (see Fig. S2) and used them as starting points for ground-state QM/MM-MD trajectories.
Each trajectory was run for 20 ps using a timestep ∆t=0.5 fs and the Bussi-Parrinello stochastic thermostat at 300K. .We reduced the original dimeric system by keeping only the residues (protein and water molecules) within 26 Å from the chromophore to keep the computational cost feasible.The part of protein and solvent at more than 22 Å away from the BV were kept frozen.The BV chromophore was treated with the multi-reference floating occupation molecular orbital-complete active space configuration interaction (FOMO-CASCI) method 24,25 in combination with the AM1 Hamiltonian. 26We used an active space of (6,6) combined with a gaussian width of 0.05 a.u.for FOMOs, as it has already been used and validated in previous work. 27The boundary between the covalently linked QM and MM parts was treated using a connection atom scheme previously used for semiempirical QM/MM approaches. 28According to this scheme, the C 3β carbon atom of the chromophore is the connection atom which behaves as a hydrogen atom in the QM part and as a normal carbon atom in the MM part.Conversely, the protein was described with the AMBER ff14SB force field and water molecules with TIP3P, as in the fully MM MDs.The QM/MM trajectories were performed with a modified version of the semiempirical program MOPAC2002, 29 interfaced with the TINKER software package. 30e last 15 ps of the ten ground-state QM/MM trajectories were used to extract the initial conditions (nuclear coordinates and momenta) for generating the swarm of non-adiabatic surface hopping trajectories.This sampling has been performed by taking into account both the excitation energy and the radiative transition probability (Fig. S3, S4). 25 .Since the oscillator strength for the S 0to-S 2 excitation is small and the first excited state is well separated from the second one, all the In all SH trajectories, the QM and the MM subsystems and their respective levels of theory were the same used in the ground-state QM/MM simulations.The local diabatization algorithm 24,25 was used for the integration of the time-dependent Schrödinger equation for the electrons, the timestep ∆t was reduced to 0.1 fs, and quantum decoherence was taken into account by applying an energy-based correction. 31For all non-adiabatic trajectories an energy conservation threshold of c.a. 5 kcal mol −1 was set.After a backward hop, we rescale the component of the velocity vector parallel to the non-adiabatic coupling vector between the old and the new active states.Regarding forward hops, the nuclear velocity vector is rescaled by keeping its direction unaltered.We forbid hops when the energy gap between electronic states S 0 and S 1 was larger than 0.5 eV.
To reduce the computational cost, trajectories were stopped when one of the following two criteria Density (a.u.) was fulfilled: (i) the trajectory has been running on the ground state surface for 1 ps; (ii) the total simulation time is larger than 20 ps.
Surface Hopping simulations were performed with a modified version of the semiempirical program MOPAC2002, 29 interfaced with the TINKER software package. 30l the figures, the analysis of the trajectories, and the fitting were performed with matplotlib 32 , in-house FORTRAN 90 scripts, SciPy 33 , and in-house Python scripts.
From the protoproduct to the Lumi-F Representative configurations of the photoproduct obtained from the last frames of the reactive SH trajectories, i.e., 1 ps after the S 1 →S 0 hop, were used to propagate a swarm of ground-state MM MD simulations in search of the Lumi-F intermediate.In these simulations, we reintroduced a full solvation of the protein using a truncated octahedron water box.Both QM/MM and MM MD simulations were performed.In the former case, an HF/AM1 approach has been used to describe the chromophore to avoid discontinuity with respect to the nonadiabatic SH simulations.
The solvatated system was subjected to 400 steps of energy minimization using steepest descent and other 600 steps using conjugate gradient.Then, a gradually heating to 300 K in a NVT ensemble in 20 ps followed, restraining the movement of the chromophore and the protein backbone with a 4 kcal mol −1 Å −2 harmonic potential.It followed a 30 ps NPT equilibration run with the same restraints on the chromophore and all the protein backbone.Sixty-three QM/MM trajectories were propagated in the NPT ensemble for 1.5 ns each and average-quantity analyses were made on 15000 frames per trajectory.

Sampling the Lumi-F
Finally, to extend the investigation of the intermediate evolution in the µs-timescale, we switched to a fully MM description using the same FF used for the sampling of the Pfr state.Ten independent replicas were run for 1 µs each and average-quantity analysis were made on 10000 frames per trajectory.
The solvated system was subjected to 400 steps of energy minimization using steepest descent and other 1600 steps using conjugate gradient.Then, a gradually heating to 300 K in a NVT ensemble in 400 ps followed, restraining the movement of the chromophore and the protein backbone with a 4 kcal mol −1 Å −2 harmonic potential.The same constraints were maintained for the following 1 ns-long equilibration step.The production run was carried out without any restraint for 1 µs in the NPT ensemble.
For both QM/MM and MM MDs, the protein and water were described with the same force field used in the excited state simulations.
All simulations were performed applying the particle mesh Ewald (PME) truncation method (with a short-range cut-off of 10Å), an integration step of 2 fs, the SHAKE algorithm, a Langevin thermostat with a friction coefficient of 1 ps −1 , and the Monte Carlo barostat for NPT simulations.All molecular dynamics simulations were performed with AMBER22 34,35 .

Towards the Meta-F
The parameters used for the Gaussian accelerated MD (GaMD) 36 were the same as those used for the unbiased MM MD.We have used a dual-boost protocol on both dihedral (σ D = 6.0 kcal mol −1 ) and nonbonded potential energy of the system (σ P = 6.0 kcal mol −1 ) and set the system threshold energy to the lower bound E = V max .All GaMD simulations followed the following protocol.First, we ran a 4 ns-long conventional MD without any boost potential.During the first 0.8 ns, no statistics were collected, i.e., the maximum (V max ), minimum (V min ), average (V avg ), and standard deviation (σ V ) of the system potential, while in the remaining 3.2 ns potential statistics are collected.Then, a 100 ns-long GaMD equilibration was conducted: during the first 3.2 ns, boost parameters were not updated, while in the remaining 96.8 ns, they were updated.Following this, a 2 µs-long GaMD production run was conducted where the boost was applied, and the parameters not updated.

S2 The proton transfer mechanism QM/MM optimizations
To investigate the hypothesis of a deprotonation of the NH group of D ring via the carbonyl group, we first performed both ground-and excited-state QM/MM optimizations of BV before (keto) and after (enol) the proton transfer from nitrogen to oxygen.We used six different configurations of the system, extracted from our clusters of the Pfr state, that differ for the presence/absence of water molecules close to the D-ring carbonyl (Fig. S8).We selected three configurations with one (or more) water molecules coordinated to the D-ring, and three configurations without such an interaction.
Since the process is expected to complete in 50 fs, 9 all the calculations were performed assuming that the protein is frozen in the initial configuration before the proton transfer.The QM subsystem is comprehensive of the BV chromophore, the sidechain of the covalently-bound cysteine residue, and an aspartate residue (Asp196), and all the water molecules within 3Å from the D-ring oxygen/nitrogen.The ground-state optimizations have been performed at ωB97XD/6-31G(d,p) level of theory whereas, to better describe the excited states, the basis set has been enlarged to include diffusion functions (6-31+G(d,p)).The frequency calculations to obtain the zero-point correction were computed at the same level of theory.All the calculations were performed with Gaussian16. 37e energy differences between the keto-like and enol-like forms obtained for all the selected configurations are shown in Tab.S1.

QM/MM MD simulations
Overall, we ran ten excited-state polarizable embedding QM/MM 38 MD simulations starting from 10 frames extracted from the three clusters of the plain MM MD simulations on the Pfr state.For each snapshot, we kept a shell containing only one monomer and solvent residues within 10 Å of the protein.Non-periodic boundary conditions were included by freezing all the residues beyond 6 Å from the protein (Fig. S5a).The BV chromophore and the sidechain of the covalently-bound cysteine residue were treated at the ωB97XD/6-31G(d,p) level of theory, whereas the remaining part of the system was described using the AMOEBA-BIO-09 force field. 39The latter employs fixed charges, dipoles, and quadrupoles to describe the electrostatics and an induced-point-dipole (IPD) model to describe polarization. 40,41To generate suitable starting conditions for the MD simulations, on each of the structures, we first performed a QM/AMOEBA geometry optimization using the limited-memory Broyden-Fletcher-Goldfarb-Shannon algorithm 42 and then an 8 ps-long ground-state equilibration in the NVT ensemble using an integration step of 0.5 fs and the Velocity Verlet algorithm.A constant temperature of 300 K was kept using the Bussi thermostat 43 with a time constant of 0.1 ps.For all trajectories, a 12.0 Å cutoff was applied for the Van der Waals interactions.All QM/AMOEBA simulations were carried out using an interface [44][45][46] between the MD package Tinker 47,48 and a locally modified version of the Gaussian suite of programs. 49om these ground-state simulations, we extracted the ten structures for the excited-state QM/AMOEBA MD simulations in the NVT ensemble.To accelerate the sampling, we profit from the flexibility and efficiency of the On-the-fly Probability Enhanced Sampling (OPES) method in its EXPLORE flavour. 50,51The OPES parameters used were 45 kcal mol −1 for the energy barrier, and 50 steps for the MD pace (25 fs).The number of MD steps for measuring adaptive sigma was set to 150.3][54] The collective variable (CV) chosen to represent the process is the difference in the proton distance from the D-ring nitrogen and oxygen.The trend of the CV for all the trajectories is shown in Fig. S6.
The obtained results fully confirm what was found with the static picture, namely an extremely unfavorable proton transfer process (Fig. S5c).As a matter of fact, some trajectories did not even show a transition to the enol-like state, despite the use of a very high BARRIER parameter in the OPES EXPLORE simulations.

Figure S1 :
Figure S1: Multiscale strategy applied for modeling the photoactivation mechanism of Agp2.

Figure
FigureS2: A) Representation of the BV chromophore with their nearby residues.B) Clustering visualized in the PCA space.Points differently colored correspond to different clusters obtained with an agglomerative clustering: cluster 0 (Cl0) in green, cluster 1 (Cl1) in yellow, and cluster 2 (Cl2) in red).The density distribution is shown for both PCA 1 and PCA 2. C) KDE distribution of the distances used in the PCA analysis that separate the three clusters.

Figure S3 :Figure S4 :
FigureS3: Distribution of the energy difference between S 0 and S 1 and between S 0 and S 2 for each replica.

Figure
Figure S5: a) Depiction of the reduced system used for the QM/MMPol MD simulations.In gray are represented all the frozen residues; in red and violet, the sodium and chloride ions, respectively; in green, the BV chromophore.b) Trend of the collective variable used in one biased QM/MMPol simulation with the respective free energy (c).

Figure S6 :Figure
Figure S6: Trend of the collective variable in all QM/MMPol biased simulations

Figure S8 :Figure
Figure S8: Distributions of the main chromophore-protein interactions along the four Pfr MM MDs.

Figure S11 :
Figure S11: Time evolution of the electronic state populations, evaluated as the fraction of trajectories running in the given state at the given time, with their 95 % confidence intervals.Black line: fit of the S 1 population according to the following biexponential function A 1 exp −t/t 1 + A 2 exp −t/t 2 .A 1 = 0.34, A 2 = 0.74, t 1 = 4.25, t 2 = 0.71.

Figure S12 :
Figure S12: Trend of the dihedral D5 for each QM/MM MDs.

Figure S13 :
Figure S13: KDE distributions for the dihedrals D1−D6.In black, blue, and orange are represented the distributions of the Pfr state, after the SH simulations, and after 1.5 ns-long QM/MM MDs, respectively.

Figure S14 :
Figure S14: Trend of the dihedral D5 for each MM MDs.

Figure S15 :Figure S16 :
Figure S15: HB distance between the D-ring carbonyl and the His278 or Tyr165 residues.In bold, a moving average with a window of 20 frames.

Figure
Figure S20: a) Trend of the hydrogen bonding distance between the carbonyl group and Tyr165 or His278 along the two GaMD MDs.b) Trend of the dihedral D5 along the two GaMD MDs.

Table S1 :
Energy difference (enol − keto) in the ground and excited states in kcal mol −1 .Structures I to III are characterized by the absence of water molecules within hydrogen bonding distance (3.2Å) of the D-ring oxygen/nitrogen, while structures IV to VI have at least one water molecule within 3.2Å of the same atoms.In structures I Asp and IV Asp , we used the same structures as in I and IV to simulate proton transfer to Asp196. 55 ).The tongue and the spine are highlighted in yellow and red, respectively.